Will Big Data Close the Missing Heritability Gap?

Authors

  • Hwasoon Kim
  • Alexander Grueneberg
  • Ana I Vazquez
  • Stephen Hsu
  • Gustavo de Los Campos
Abstract

Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of big data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a surface response relating prediction R-sq. with sample size and model complexity (e.g., number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing (n = 22,221) of 0.24 (95% C.I.: 0.23-0.25). Our estimates show that prediction R-sq. increases with sample size, reaching an estimated plateau at values that ranged from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Soon much larger data sets will become available. Using the estimated surface response, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that big data will lead to a substantial reduction of the gap between trait heritability and the proportion of interindividual differences that can be explained with a genomic predictor. However, even with the power of big data, for complex traits we anticipate that the gap between prediction R-sq. and trait heritability will not be fully closed.
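The idea of prediction R-sq. rising with sample size toward a plateau can be illustrated with a small sketch. It is not the authors' Bayesian procedure: the saturating form R-sq.(n) = plateau * n / (n + half_n), the names `saturating_r2` and `fit_plateau`, and every number below are hypothetical assumptions chosen only for illustration. The data are synthetic and noise-free, not UK Biobank results.

```python
def saturating_r2(n, plateau, half_n):
    """Assumed curve: R-sq. rises with sample size n toward `plateau`.

    `half_n` is the sample size at which R-sq. reaches half the plateau.
    """
    return plateau * n / (n + half_n)


def fit_plateau(sample_sizes, r2_values):
    """Estimate (plateau, half_n) by least squares on the linearized form
    1/R2 = 1/plateau + (half_n / plateau) * (1/n).
    """
    xs = [1.0 / n for n in sample_sizes]
    ys = [1.0 / r for r in r2_values]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    plateau = 1.0 / intercept      # intercept equals 1 / plateau
    half_n = slope * plateau       # slope equals half_n / plateau
    return plateau, half_n


# Synthetic illustration only: values generated from the assumed curve itself.
true_plateau, true_half_n = 0.4, 50_000
sizes = [10_000, 20_000, 40_000, 80_000]
r2 = [saturating_r2(n, true_plateau, true_half_n) for n in sizes]

est_plateau, est_half_n = fit_plateau(sizes, r2)
print(f"estimated plateau: {est_plateau:.3f}, half-saturation n: {est_half_n:.0f}")
```

With real, noisy R-sq. estimates the linearized fit would weight small sample sizes heavily, so a direct nonlinear fit would probably be a better choice; the sketch only shows how a plateau can be read off a fitted curve.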

Similar resources

Close Following Behavior: Estimation of Desired Gap Headway Using Loop Detector Data (TECHNICAL NOTE)

The desired gap headway of drivers, while close following, represents the main parameter in determining the following distance between vehicles. This paper uses raw individual-vehicle data taken from loop detectors for millions of vehicles that used the M25 and M42 in order to estimate the gap headway distributions between successive pairs of vehicles. The data used in this paper were filtered so...

Bridging the Gap between Statistical and Biological Epistasis in Alzheimer's Disease

Alzheimer's disease affects millions of people worldwide and incidence is expected to rise as the population ages, but no effective therapies exist despite decades of research and more than 20 known disease markers. Research has shown that Alzheimer's disease's missing heritability remains extensive with an estimated 25% of phenotypic variance unexplained by known variants. The missing heritabi...

The Human Microbiome and the Missing Heritability Problem

The "missing heritability" problem states that genetic variants in Genome-Wide Association Studies (GWAS) cannot completely explain the heritability of complex traits. Traditionally, the heritability of a phenotype is measured through familial studies using twins, siblings and other close relatives, making assumptions on the genetic similarities between them. When this heritability is compared ...

Epigenetic inheritance and the missing heritability problem.

Epigenetic phenomena, and in particular heritable epigenetic changes, or transgenerational effects, are the subject of much discussion in the current literature. This article presents a model of transgenerational epigenetic inheritance and explores the effect of epigenetic inheritance on the risk and recurrence risk of a complex disease. The model assumes that epigenetic modifications of the ge...

Narrowing the gap on heritability of common disease by direct estimation in case-control GWAS

One of the major developments in recent years in the search for missing heritability of human phenotypes is the adoption of linear mixed-effects models (LMMs) to estimate heritability due to genetic variants which are not significantly associated with the phenotype [1]. A variant of the LMM approach has been adapted to case-control studies and applied to many major diseases [2–5], succes...

Journal title:

Volume 207, Issue

Pages -

Publication date: 2017